- No automatic sparsity
- No flexible user-friendly software
- Computationally expensive
Focus on two situations:
In both cases, the ratio \(p/n\) is high, which leads to overfitting and uninterpretable models.
In models with many parameters, regularization is increasingly used to prevent overfitting.
Regularized SEM adds a penalty on the parameters to be regularized, e.g., a ridge, lasso, or elastic net penalty:
\[ F_{regsem}(S, \Sigma(\theta)) = F(S, \Sigma(\theta)) + \lambda P(\theta_{reg}) \]
See, for example, Jacobucci, Grimm, and McArdle (2016) or Huang (2020).
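As a concrete illustration (these are the standard textbook definitions, not specific to any one of the cited papers), the three penalties mentioned above take the forms
\[ P_{ridge}(\theta_{reg}) = \sum_j \theta_j^2, \qquad P_{lasso}(\theta_{reg}) = \sum_j |\theta_j|, \qquad P_{enet}(\theta_{reg}) = \alpha \sum_j |\theta_j| + (1 - \alpha) \sum_j \theta_j^2, \]
where the sums run over the regularized parameters and \(\alpha \in [0, 1]\) mixes the lasso and ridge parts.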
A shrinkage prior takes the role of the penalty:
\[ \text{posterior} \propto \text{likelihood} \times \text{prior} \]
Ideal shrinkage prior:
Many different shrinkage priors exist (see e.g., Van Erp, Oberski, and Mulder (2019)) and several have been applied in SEM (see Van Erp (2023) for an overview).
In Bayesian regularized SEM, parameters are not automatically set to zero.
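Sparsity is therefore typically recovered through a post-hoc selection rule, for example checking whether the 95% credible interval excludes zero. A minimal sketch with simulated posterior draws (the parameter names and values are made up for illustration):

```r
# Post-hoc selection: flag a parameter as nonzero when its 95% credible
# interval excludes zero. Draws below are simulated for illustration only.
set.seed(42)
draws <- cbind(theta1 = rnorm(4000, mean = 0.30, sd = 0.05),  # clearly nonzero
               theta2 = rnorm(4000, mean = 0.02, sd = 0.05))  # close to zero
ci <- apply(draws, 2, quantile, probs = c(0.025, 0.975))
nonzero <- ci[1, ] > 0 | ci[2, ] < 0  # does the interval exclude zero?
nonzero
```

Other rules (e.g., thresholding the posterior median) are possible, and the chosen rule drives which parameters end up excluded.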
MIMIC model drawn with https://semdiag.psychstat.org
Joint work (in progress) with Paul Bürkner and Aki Vehtari.
Goal: find a smaller submodel that predicts practically as well as the larger reference model.
See e.g., Piironen and Vehtari (2017), Pavone et al. (2020), Piironen, Paasiniemi, and Vehtari (2020), or McLatchie et al. (2023)
Idea behind posterior projection: replace the reference model posterior \(p(\theta_*|D)\) with a simpler, restricted model \(q_\perp (\theta)\).
Implementation: minimize the KL divergence between the induced predictive distributions:
\[ \theta_\perp = \operatorname*{arg\,min}_{\theta} \mathrm{KL}\left( p(\tilde{y} \mid \theta_*) \,\|\, p(\tilde{y} \mid \theta) \right) \]
Practical implementation: instead of numerical optimization, an analytical solution is available for Gaussian linear models (Piironen, Paasiniemi, and Vehtari (2020)):
\[ \beta_{proj} = (X^T X)^{-1} X^T \mu_* \] where \(\mu_*\) denotes the predictions based on the reference model and \(X\) the design matrix of the restricted model.
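A minimal sketch of this formula on simulated data (all object names here are illustrative and independent of projpred's internals):

```r
# Analytical projection for a Gaussian linear model:
# beta_proj = (X^T X)^{-1} X^T mu_*, with mu_* the reference-model predictions.
set.seed(1)
n <- 100
X_full <- cbind(1, matrix(rnorm(n * 3), n, 3))  # intercept + 3 predictors
beta_true <- c(1, 2, 0, 0)                      # only the first predictor matters
y <- X_full %*% beta_true + rnorm(n)

# Reference-model predictions (here: least-squares fit of the full model)
mu_star <- X_full %*% solve(crossprod(X_full), crossprod(X_full, y))

# Restricted model: intercept + first predictor only
X_sub <- X_full[, 1:2]
beta_proj <- solve(crossprod(X_sub), crossprod(X_sub, mu_star))
```

Because the restricted design is a subset of the full one, the projected coefficients coincide exactly with a least-squares fit of \(y\) on the restricted design.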
Available in the R package projpred (Piironen et al. (2023))
library(lavaan)
library(brms)
library(projpred)
mod <- 'F =~ y1 + y2 + y3 + y4 + y5
F ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10'
fit.lavaan <- sem(mod, data = df)
fs <- lavPredict(fit.lavaan, method = "Bartlett")
df$fs <- as.vector(fs)
refm_fit <- brm(fs ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
data = df,
prior = prior_hs)
refm_obj <- get_refmodel(refm_fit)
cvvs <- cv_varsel(
refm_obj,
cv_method = "kfold",
K = 10
)
plot(cvvs)
The suggest_size heuristic in projpred can be influential. projpred seems to perform well, especially in very sparse settings. See Karimova et al. (preprint) on the OSF.
Idea: Approximate the likelihood with a multivariate normal centered around the MLE.
Step 1: Fit the model and extract the unregularized estimates and error covariance matrix
Step 2: Fit an approximate regularized Bayesian model with a shrinkage prior: \(p(\theta, \lambda|D) \propto N(\theta|\hat{\theta}_{MLE}, \hat{\Sigma}_{\theta}) p(\theta|\lambda) p(\lambda)\)
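As a sketch of why this approximation is convenient: with a ridge-type normal prior \(p(\theta|\lambda) = N(0, \lambda^{-1} I)\) and \(\lambda\) fixed, the approximate conditional posterior is available in closed form,
\[ p(\theta \mid D, \lambda) \approx N\!\left( \left( \hat{\Sigma}_{\theta}^{-1} + \lambda I \right)^{-1} \hat{\Sigma}_{\theta}^{-1} \hat{\theta}_{MLE},\; \left( \hat{\Sigma}_{\theta}^{-1} + \lambda I \right)^{-1} \right), \]
while non-conjugate shrinkage priors such as the horseshoe require sampling, which remains cheap because the approximate likelihood is just a multivariate normal density.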
Implemented in shrinkem.
shrinkem illustration 1 - Restricted factor analysis
Restricted factor model for the Holzinger & Swineford (1939) data, visualized using semPlot. See also Liang and Jacobucci (2019).
library(shrinkem)
# extract MLEs and covariance matrix for regression parameters corresponding to age
mle <- coef(fit_lav)
mleSel <- mle[grep("~age", names(mle))]
covmat <- lavInspect(fit_lav, what = "vcov")
id <- grep("~age", colnames(covmat))
covmatSel <- covmat[id, id]
# shrinkem with horseshoe prior
shrink_hs <- shrinkem(mleSel, covmatSel, type="horseshoe")
summary(shrink_hs)
[1] "Start burnin ... "
================================================================================
[1] "Start posterior sampling ... "
================================================================================
Call:
shrinkem.default(x = mleSel, Sigma = covmatSel, type = "horseshoe")
est shrunk.mean shrunk.median shrunk.mode shrunk.lower
t01_visperc~age -0.060 -0.035 -0.035 -0.003 -0.135
t02_cubes~age 0.011 0.006 0.006 0.001 -0.069
t03_frmbord~age 0.066 0.039 0.039 0.003 -0.035
t04_lozenges~age 0.079 0.052 0.052 0.005 -0.026
t05_geninfo~age -0.191 -0.076 -0.076 -0.007 -0.219
t06_paracomp~age -0.214 -0.100 -0.100 -0.085 -0.241
t07_sentcomp~age -0.259 -0.148 -0.148 -0.128 -0.283
t08_wordclas~age -0.271 -0.168 -0.168 -0.162 -0.301
t09_wordmean~age -0.165 -0.052 -0.052 -0.003 -0.188
t10_addition~age 0.159 0.115 0.115 0.117 0.000
t11_code~age 0.034 0.009 0.009 0.000 -0.062
t12_countdot~age 0.242 0.211 0.211 0.214 0.102
t13_sccaps~age 0.092 0.048 0.048 0.004 -0.026
t14_wordrecg~age -0.030 -0.016 -0.016 -0.001 -0.110
t15_numbrecg~age 0.069 0.035 0.035 0.002 -0.037
t16_figrrecg~age -0.096 -0.066 -0.066 -0.008 -0.182
t17_objnumb~age 0.136 0.099 0.099 0.088 -0.005
t18_numbfig~age 0.075 0.041 0.041 0.003 -0.034
t19_figword~age -0.051 -0.026 -0.026 -0.001 -0.129
shrunk.upper nonzero
t01_visperc~age 0.033 0
t02_cubes~age 0.090 0
t03_frmbord~age 0.146 0
t04_lozenges~age 0.163 0
t05_geninfo~age 0.017 0
t06_paracomp~age 0.006 0
t07_sentcomp~age -0.023 1
t08_wordclas~age -0.041 1
t09_wordmean~age 0.033 0
t10_addition~age 0.239 0
t11_code~age 0.094 0
t12_countdot~age 0.321 1
t13_sccaps~age 0.161 0
t14_wordrecg~age 0.056 0
t15_numbrecg~age 0.140 0
t16_figrrecg~age 0.019 0
t17_objnumb~age 0.223 0
t18_numbfig~age 0.156 0
t19_figword~age 0.045 0
| Package | Time needed |
|---|---|
| blavaan | 21.5s |
| shrinkem | 10.3s |
shrinkem illustration 2 - Mediation analysis
Suppose we wish to regularize the indirect paths \(ab\) directly.
library(shrinkem)
library(blavaan)
# model specification (nM = number of mediators)
M <- paste0("M", 1:nM, " ~ ", "X \n")
Y <- c("Y ~ ", paste0("M", 1:nM, sep = " + "), "X \n")
lavmod <- c(M, Y)
# fit unregularized model with blavaan
defaultPriors <- dpriors(beta = "normal(0, 10000)")
fit_blav <- bsem(lavmod,
data = lav_train,
std.lv = TRUE,
dp = defaultPriors)
draws <- as.matrix(blavInspect(fit_blav, what = "mcmc"))
# compute indirect effects
draws_ab <- matrix(NA, nrow = nrow(draws), ncol = nM)
for(i in 1:nM){
draws_a <- draws[, grep(paste0("M", i, "~X"), colnames(draws), fixed = TRUE)]
draws_b <- draws[, grep(paste0("Y~M", i, "$"), colnames(draws))]
draws_ab[, i] <- draws_a * draws_b
}
mle <- colMeans(draws_ab)
covmat <- cov(draws_ab)
# shrinkem with horseshoe prior
shrink_hs <- shrinkem(mle, covmat, type="horseshoe", iterations = 4000)
summary(shrink_hs)
Two advantages:
- Faster than “exact” MCMC
- More flexible than classical regularization in terms of models (e.g., REMs, SEMs, GGNs) and priors (ridge, (group) lasso, horseshoe)
Two disadvantages:
- Relies on the accuracy of the normal approximation
- Dependent on the availability of MLEs
shrinkem offers a partial solution, but what if MLEs are not available?
Other approximate Bayesian algorithms might offer a solution.
See tomorrow’s talk (10:20h) on “Posterior Estimators in Highly Dimensional Bayesian Penalised Regression”.
| Challenge | Solution |
|---|---|
| No automatic sparsity | ProjpredSEM |
| No flexible user-friendly software | shrinkem |
| Computationally expensive | shrinkem and other approximate algorithms |
Feel free to reach out during this conference, or via e-mail: s.j.vanerp@uu.nl.
These slides are available online at: https://github.com/sara-vanerp/presentations
Sara van Erp